Robust partially observable Markov decision process
Author
Abstract
We seek the robust policy that maximizes the expected cumulative reward in the worst case when a partially observable Markov decision process (POMDP) has uncertain parameters whose values are known only to lie in a given region. We prove that the robust value function, which represents the expected cumulative reward obtainable with the robust policy, is convex with respect to the belief state. Based on this convexity, we design a value-iteration algorithm for finding the robust policy and prove that it converges for an infinite horizon. We also design a point-based value iteration that finds the robust policy more efficiently, possibly at the cost of approximation. Numerical experiments show that our point-based value iteration can adequately find robust policies.
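The abstract does not say how the uncertainty region is represented or how the point-based backup is carried out. The following is therefore only a minimal sketch under stated assumptions, not the paper's algorithm. It assumes that the region is a finite list of candidate models (a transition array T and an observation array O) and that nature may pick a different candidate for each state and action. Under that assumption every backed-up vector is linear in the belief, so the value function V(b) = max over alpha of b · alpha stays convex, which is the property the abstract builds on. The helpers `robust_backup`, `robust_pbvi`, and `value`, and the two-state example, are hypothetical.

```python
import itertools

import numpy as np


def robust_backup(b, alpha_set, R, models, discount):
    """One point-based robust backup at belief b (hypothetical helper).

    Assumed uncertainty model: a finite list of candidate (T, O) pairs, and
    nature may pick a different candidate for each state (and action).
    Shapes: R[s, a], T[s, a, s'], O[a, s', o] = P(o | s', a).
    Returns the alpha-vector (one entry per state) that is best at b.
    """
    n_actions = R.shape[1]
    n_obs = models[0][1].shape[2]
    best_alpha, best_value = None, -np.inf
    for a in range(n_actions):
        # Try every assignment of a successor alpha-vector to each observation
        # (exponential in n_obs; acceptable only for tiny examples).
        for choice in itertools.product(alpha_set, repeat=n_obs):
            succ = np.stack(choice)  # succ[o, s'] = alpha_o(s')
            # cont[k, s] = sum over s', o of T_k[s, a, s'] O_k[a, s', o] alpha_o(s')
            cont = np.array([
                np.einsum("ij,jo,oj->i", T[:, a, :], O[a], succ)
                for T, O in models
            ])
            # Worst case taken separately for each current state s.
            alpha = R[:, a] + discount * cont.min(axis=0)
            val = float(b @ alpha)
            if val > best_value:
                best_alpha, best_value = alpha, val
    return best_alpha


def robust_pbvi(beliefs, R, models, discount, n_iter=200):
    """Point-based value iteration using robust_backup (sketch)."""
    alpha_set = [np.zeros(R.shape[0])]  # start from V_0(b) = 0
    for _ in range(n_iter):
        backed_up = [robust_backup(b, alpha_set, R, models, discount)
                     for b in beliefs]
        # Drop exact duplicates so the enumeration above stays small.
        alpha_set = list({tuple(v): v for v in backed_up}.values())
    return alpha_set


def value(b, alpha_set):
    """V(b) = max over alpha-vectors of b . alpha (convex, piecewise linear)."""
    return max(float(b @ alpha) for alpha in alpha_set)


# Toy two-state example with made-up numbers: reward 1 in state 0 and 0 in
# state 1, a single action, and a perfectly observed state.
R = np.array([[1.0], [0.0]])
O = np.eye(2)[None, :, :]              # O[0, s', o] = 1 if o == s'
T_stay = np.eye(2)[:, None, :]         # candidate A: the state never changes
T_flip = np.eye(2)[::-1][:, None, :]   # candidate B: the state always flips
beliefs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

robust = robust_pbvi(beliefs, R, [(T_stay, O), (T_flip, O)], discount=0.9)
nominal = robust_pbvi(beliefs, R, [(T_stay, O)], discount=0.9)
print(value(beliefs[0], robust))   # 1.0: nature moves state 0 to 1, then keeps it there
print(value(beliefs[0], nominal))  # close to 1 / (1 - 0.9) = 10
```

Enumerating every assignment of successor vectors to observations keeps the backup exact at each belief point, but its cost grows exponentially with the number of observations, so it would need to be replaced for anything beyond toy problems.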
Similar resources
A POMDP Framework to Find Optimal Inspection and Maintenance Policies via Availability and Profit Maximization for Manufacturing Systems
Maintenance can either increase or decrease a system's availability, so it is valuable to evaluate a maintenance policy from both the cost and the availability points of view, simultaneously and according to the decision maker's priorities. This study proposes a Partially Observable Markov Decision Process (POMDP) framework for a partially observable and stochastically deteriorating syste...
Robust Person Guidance by Using Online POMDPs
The paper considers a guiding task in which a robot has to guide a person towards a destination. A robust operation requires considering uncertain models of the person's motion and intentions, as well as noise and occlusions in the sensors employed for the task. Partially Observable Markov Decision Processes (POMDPs) are used to model the task. The paper describes an enhancement on online POMDP s...
Unmanned Aircraft Collision Avoidance Using Partially Observable Markov Decision Processes
Before unmanned aircraft can fly safely in civil airspace, robust airborne collision avoidance systems must be developed. Instead of hand-crafting a collision avoidance algorithm for every combination of sensor and aircraft configuration, this project investigates the automatic generation of collision avoidance logic given models of aircraft dynamics, sensor performance, and intruder behavior. ...
Grasping POMDPs: Theory and Experiments
We describe a method for planning under uncertainty for robotic manipulation of objects by partitioning the configuration space into a set of regions that are closed under compliant motions. These regions can be treated as states in a partially observable Markov decision process (POMDP), which can be solved to yield optimal control policies under uncertainty. We demonstrate the approa...
New Grid-Based Algorithms for Partially Observable Markov Decision Processes: Theory and Practice
We present two new algorithms for Partially Observable Markov Decision Processes (pomdps). The first algorithm is a general grid-based algorithm for pomdps with theoretical optimality guarantees. The other algorithm is for the subclass of problems known as Stochastic Shortest-Path problems in belief space. Both algorithms are optimal and robust with respect to a novel robustness criterion that ...